Overview

Dataset statistics

Number of variables: 17
Number of observations: 2591730
Missing cells: 12958650
Missing cells (%): 29.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 GiB
Average record size in memory: 512.2 B
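
The figures in this block are tied together by simple arithmetic. The sketch below recomputes the missing-cell percentage and the approximate memory footprint from the reported counts; the variable names are illustrative, and the reading that the missing cells come from five fully empty columns is an inference from the Symbol alerts in this report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 17
n_observations = 2_591_730
missing_cells = 12_958_650
avg_record_bytes = 512.2

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Number of columns that would have to be entirely empty to explain the count.
fully_missing_columns = missing_cells // n_observations

# Approximate footprint implied by the average record size.
total_gib = avg_record_bytes * n_observations / 2**30

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"fully missing columns: {fully_missing_columns}")
print(f"approximate size: {total_gib:.1f} GiB")
```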

Variable types

Categorical: 3
Text: 4
Numeric: 5
Unsupported: 5

Alerts

Constant: REF_DATE has constant value "2021"
High correlation: Knowledge of official languages (5):Total - Knowledge of official languages[1] is highly overall correlated with Knowledge of official languages (5):English only[2] and 2 other fields
High correlation: Knowledge of official languages (5):English only[2] is highly overall correlated with Knowledge of official languages (5):Total - Knowledge of official languages[1] and 2 other fields
High correlation: Knowledge of official languages (5):French only[3] is highly overall correlated with Knowledge of official languages (5):English and French[4]
High correlation: Knowledge of official languages (5):English and French[4] is highly overall correlated with Knowledge of official languages (5):Total - Knowledge of official languages[1] and 3 other fields
High correlation: Knowledge of official languages (5):Neither English nor French[5] is highly overall correlated with Knowledge of official languages (5):Total - Knowledge of official languages[1] and 2 other fields
Missing: Symbol has 2591730 (100.0%) missing values
Missing: Symbol.1 has 2591730 (100.0%) missing values
Missing: Symbol.2 has 2591730 (100.0%) missing values
Missing: Symbol.3 has 2591730 (100.0%) missing values
Missing: Symbol.4 has 2591730 (100.0%) missing values
Skewed: Knowledge of official languages (5):Total - Knowledge of official languages[1] is highly skewed (γ1 = 270.4104924)
Skewed: Knowledge of official languages (5):English only[2] is highly skewed (γ1 = 259.6271826)
Skewed: Knowledge of official languages (5):French only[3] is highly skewed (γ1 = 249.2697714)
Skewed: Knowledge of official languages (5):English and French[4] is highly skewed (γ1 = 235.2014829)
Skewed: Knowledge of official languages (5):Neither English nor French[5] is highly skewed (γ1 = 204.0238494)
Uniform: Gender (3) is uniformly distributed
Uniform: Age (15A) is uniformly distributed
Unique: Coordinate has unique values
Unsupported: Symbol is an unsupported type, check if it needs cleaning or further analysis
Unsupported: Symbol.1 is an unsupported type, check if it needs cleaning or further analysis
Unsupported: Symbol.2 is an unsupported type, check if it needs cleaning or further analysis
Unsupported: Symbol.3 is an unsupported type, check if it needs cleaning or further analysis
Unsupported: Symbol.4 is an unsupported type, check if it needs cleaning or further analysis
Zeros: Knowledge of official languages (5):Total - Knowledge of official languages[1] has 2090181 (80.6%) zeros
Zeros: Knowledge of official languages (5):English only[2] has 2157937 (83.3%) zeros
Zeros: Knowledge of official languages (5):French only[3] has 2513182 (97.0%) zeros
Zeros: Knowledge of official languages (5):English and French[4] has 2361395 (91.1%) zeros
Zeros: Knowledge of official languages (5):Neither English nor French[5] has 2451876 (94.6%) zeros
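
The Skewed and Zeros alerts describe the same pattern: a column dominated by zeros with a few very large counts. The sketch below illustrates this on made-up data, using the population form of the skewness coefficient, g1 = m3 / m2^1.5. This is an assumption for illustration: the profiler may apply a small-sample correction, which is negligible at millions of rows. The function names and the sample are hypothetical.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return float("nan")  # undefined for a constant column
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Illustrative sample only: mostly zeros, a few small counts, one large total.
sample = [0] * 95 + [5] * 4 + [1000]
print(f"zeros: {zero_share(sample):.0%}, skewness: {skewness(sample):.2f}")
```

A single large value among many zeros is enough to push the coefficient far above what a symmetric distribution would give, which is consistent with the totals rows (for example the Canada-wide counts) sitting in the same column as many small cells.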

Reproduction

Analysis started: 2023-11-04 15:59:46.422832
Analysis finished: 2023-11-04 16:00:37.683197
Duration: 51.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 150.8 MiB
2021: 2591730

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 10366920
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
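
As a minimal sketch of how such per-category counts can be derived, the example below uses Python's standard `unicodedata` module to tally the general category of each character. The helper name `category_counts` is illustrative, and "Total - Gender" is a value taken from the Gender (3) variable later in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# 'Nd' is Decimal Number, 'Lu'/'Ll' are upper/lowercase letters,
# 'Zs' is Space Separator and 'Pd' is Dash Punctuation.
print(category_counts("2021"))
print(category_counts("Total - Gender"))
```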

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021
2nd row: 2021
3rd row: 2021
4th row: 2021
5th row: 2021

Common Values

ValueCountFrequency (%)
2021 2591730
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2021 2591730
100.0%

Most occurring characters

ValueCountFrequency (%)
2 5183460
50.0%
0 2591730
25.0%
1 2591730
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 10366920
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 5183460
50.0%
0 2591730
25.0%
1 2591730
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common 10366920
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 5183460
50.0%
0 2591730
25.0%
1 2591730
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10366920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 5183460
50.0%
0 2591730
25.0%
1 2591730
25.0%

GEO
Text

Distinct: 174
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 198.2 MiB

Length

Max length: 44
Median length: 35
Mean length: 21.827586
Min length: 5

Characters and Unicode

Total characters: 56571210
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada
ValueCountFrequency (%)
ca 1742715
20.3%
ont 640485
 
7.5%
cma 640485
 
7.5%
que 476640
 
5.5%
b.c 417060
 
4.9%
alta 253215
 
2.9%
sask 148950
 
1.7%
part 119160
 
1.4%
119160
 
1.4%
n.b 104265
 
1.2%
Other values (216) 3932280
45.8%

Most occurring characters

ValueCountFrequency (%)
6002685
 
10.6%
. 3276900
 
5.8%
C 3187530
 
5.6%
a 3142845
 
5.6%
e 3008790
 
5.3%
A 2785365
 
4.9%
t 2725785
 
4.8%
n 2636415
 
4.7%
( 2502360
 
4.4%
) 2502360
 
4.4%
Other values (48) 24800175
43.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 26989740
47.7%
Uppercase Letter 12422430
22.0%
Space Separator 6002685
 
10.6%
Other Punctuation 5749470
 
10.2%
Open Punctuation 2502360
 
4.4%
Close Punctuation 2502360
 
4.4%
Dash Punctuation 402165
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3142845
11.6%
e 3008790
11.1%
t 2725785
10.1%
n 2636415
9.8%
r 2159775
 
8.0%
o 2040615
 
7.6%
i 1727820
 
6.4%
l 1608660
 
6.0%
u 1266075
 
4.7%
s 1161810
 
4.3%
Other values (16) 5511150
20.4%
Uppercase Letter
ValueCountFrequency (%)
C 3187530
25.7%
A 2785365
22.4%
M 938385
 
7.6%
O 834120
 
6.7%
B 804330
 
6.5%
S 700065
 
5.6%
Q 625590
 
5.0%
N 476640
 
3.8%
L 268110
 
2.2%
P 253215
 
2.0%
Other values (14) 1549080
12.5%
Other Punctuation
ValueCountFrequency (%)
. 3276900
57.0%
, 2383200
41.5%
/ 59580
 
1.0%
' 29790
 
0.5%
Space Separator
ValueCountFrequency (%)
6002685
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2502360
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2502360
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 402165
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 39412170
69.7%
Common 17159040
30.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 3187530
 
8.1%
a 3142845
 
8.0%
e 3008790
 
7.6%
A 2785365
 
7.1%
t 2725785
 
6.9%
n 2636415
 
6.7%
r 2159775
 
5.5%
o 2040615
 
5.2%
i 1727820
 
4.4%
l 1608660
 
4.1%
Other values (40) 14388570
36.5%
Common
ValueCountFrequency (%)
6002685
35.0%
. 3276900
19.1%
( 2502360
14.6%
) 2502360
14.6%
, 2383200
 
13.9%
- 402165
 
2.3%
/ 59580
 
0.3%
' 29790
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 56496735
99.9%
None 74475
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6002685
 
10.6%
. 3276900
 
5.8%
C 3187530
 
5.6%
a 3142845
 
5.6%
e 3008790
 
5.3%
A 2785365
 
4.9%
t 2725785
 
4.8%
n 2636415
 
4.7%
( 2502360
 
4.4%
) 2502360
 
4.4%
Other values (45) 24725700
43.8%
None
ValueCountFrequency (%)
é 29790
40.0%
è 29790
40.0%
Î 14895
20.0%
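
Accented characters such as the ones in this table are where encoding damage usually shows up: if UTF-8 bytes are read as Windows-1252, a single character like Î surfaces as the two-character sequence ÃŽ. The sketch below shows one common way to reverse that specific kind of damage. The helper name `repair_mojibake` is hypothetical, and the approach assumes the text went through exactly one wrong decode.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already clean): leave unchanged.
        return text


print(repair_mojibake("ÃŽ"))      # damaged form of a single accented capital
print(repair_mojibake("é"))       # already correct, returned as-is
print(repair_mojibake("Canada"))  # plain ASCII is unaffected
```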

DGUID
Text

Distinct: 174
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 170.6 MiB

Length

Max length: 14
Median length: 12
Mean length: 12.028736
Min length: 11

Characters and Unicode

Total characters: 31175235
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021A000011124
2nd row: 2021A000011124
3rd row: 2021A000011124
4th row: 2021A000011124
5th row: 2021A000011124
ValueCountFrequency (%)
2021a000011124 14895
 
0.6%
2021a000212 14895
 
0.6%
2021s0504225 14895
 
0.6%
2021s0504015 14895
 
0.6%
2021s0504011 14895
 
0.6%
2021s0504010 14895
 
0.6%
2021s0503001 14895
 
0.6%
2021a000211 14895
 
0.6%
2021s0504105 14895
 
0.6%
2021s0504110 14895
 
0.6%
Other values (164) 2442780
94.3%

Most occurring characters

ValueCountFrequency (%)
0 9026370
29.0%
2 6121845
19.6%
5 3976965
12.8%
1 3262005
 
10.5%
4 2591730
 
8.3%
S 2383200
 
7.6%
3 1429920
 
4.6%
9 625590
 
2.0%
6 595800
 
1.9%
8 551115
 
1.8%
Other values (2) 610695
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 28583505
91.7%
Uppercase Letter 2591730
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 9026370
31.6%
2 6121845
21.4%
5 3976965
13.9%
1 3262005
 
11.4%
4 2591730
 
9.1%
3 1429920
 
5.0%
9 625590
 
2.2%
6 595800
 
2.1%
8 551115
 
1.9%
7 402165
 
1.4%
Uppercase Letter
ValueCountFrequency (%)
S 2383200
92.0%
A 208530
 
8.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28583505
91.7%
Latin 2591730
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 9026370
31.6%
2 6121845
21.4%
5 3976965
13.9%
1 3262005
 
11.4%
4 2591730
 
9.1%
3 1429920
 
5.0%
9 625590
 
2.2%
6 595800
 
2.1%
8 551115
 
1.9%
7 402165
 
1.4%
Latin
ValueCountFrequency (%)
S 2383200
92.0%
A 208530
 
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31175235
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 9026370
29.0%
2 6121845
19.6%
5 3976965
12.8%
1 3262005
 
10.5%
4 2591730
 
8.3%
S 2383200
 
7.6%
3 1429920
 
4.6%
9 625590
 
2.0%
6 595800
 
1.9%
8 551115
 
1.8%
Other values (2) 610695
 
2.0%

Gender (3)
Categorical

UNIFORM 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 160.7 MiB
Total - Gender: 863910
Men+: 863910
Women+: 863910

Length

Max length: 14
Median length: 6
Mean length: 8
Min length: 4

Characters and Unicode

Total characters: 20733840
Distinct characters: 16
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total - Gender
2nd row: Total - Gender
3rd row: Total - Gender
4th row: Total - Gender
5th row: Total - Gender

Common Values

ValueCountFrequency (%)
Total - Gender 863910
33.3%
Men+ 863910
33.3%
Women+ 863910
33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
total 863910
20.0%
863910
20.0%
gender 863910
20.0%
men 863910
20.0%
women 863910
20.0%

Most occurring characters

ValueCountFrequency (%)
e 3455640
16.7%
n 2591730
12.5%
o 1727820
 
8.3%
1727820
 
8.3%
+ 1727820
 
8.3%
T 863910
 
4.2%
t 863910
 
4.2%
a 863910
 
4.2%
l 863910
 
4.2%
- 863910
 
4.2%
Other values (6) 5183460
25.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12958650
62.5%
Uppercase Letter 3455640
 
16.7%
Space Separator 1727820
 
8.3%
Math Symbol 1727820
 
8.3%
Dash Punctuation 863910
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 3455640
26.7%
n 2591730
20.0%
o 1727820
13.3%
t 863910
 
6.7%
a 863910
 
6.7%
l 863910
 
6.7%
d 863910
 
6.7%
r 863910
 
6.7%
m 863910
 
6.7%
Uppercase Letter
ValueCountFrequency (%)
T 863910
25.0%
G 863910
25.0%
M 863910
25.0%
W 863910
25.0%
Space Separator
ValueCountFrequency (%)
1727820
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1727820
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 863910
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16414290
79.2%
Common 4319550
 
20.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 3455640
21.1%
n 2591730
15.8%
o 1727820
10.5%
T 863910
 
5.3%
t 863910
 
5.3%
a 863910
 
5.3%
l 863910
 
5.3%
G 863910
 
5.3%
d 863910
 
5.3%
r 863910
 
5.3%
Other values (3) 2591730
15.8%
Common
ValueCountFrequency (%)
1727820
40.0%
+ 1727820
40.0%
- 863910
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20733840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 3455640
16.7%
n 2591730
12.5%
o 1727820
 
8.3%
1727820
 
8.3%
+ 1727820
 
8.3%
T 863910
 
4.2%
t 863910
 
4.2%
a 863910
 
4.2%
l 863910
 
4.2%
- 863910
 
4.2%
Other values (6) 5183460
25.0%

Age (15A)
Categorical

UNIFORM 

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 174.7 MiB
Total - Age: 172782
0 to 14 years: 172782
0 to 4 years: 172782
5 to 9 years: 172782
10 to 14 years: 172782
Other values (10): 1727820

Length

Max length: 17
Median length: 14
Mean length: 13.666667
Min length: 11

Characters and Unicode

Total characters: 35420310
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total - Age
2nd row: Total - Age
3rd row: Total - Age
4th row: Total - Age
5th row: Total - Age

Common Values

ValueCountFrequency (%)
Total - Age 172782
 
6.7%
0 to 14 years 172782
 
6.7%
0 to 4 years 172782
 
6.7%
5 to 9 years 172782
 
6.7%
10 to 14 years 172782
 
6.7%
15 to 24 years 172782
 
6.7%
15 to 19 years 172782
 
6.7%
20 to 24 years 172782
 
6.7%
25 to 64 years 172782
 
6.7%
25 to 34 years 172782
 
6.7%
Other values (5) 863910
33.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 2418948
23.7%
to 2246166
22.0%
15 345564
 
3.4%
64 345564
 
3.4%
0 345564
 
3.4%
14 345564
 
3.4%
25 345564
 
3.4%
24 345564
 
3.4%
total 172782
 
1.7%
45 172782
 
1.7%
Other values (18) 3110076
30.5%

Most occurring characters

ValueCountFrequency (%)
7602408
21.5%
e 2764512
 
7.8%
a 2764512
 
7.8%
o 2591730
 
7.3%
r 2591730
 
7.3%
t 2418948
 
6.8%
s 2418948
 
6.8%
y 2418948
 
6.8%
4 2246166
 
6.3%
5 2073384
 
5.9%
Other values (15) 5529024
15.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18833238
53.2%
Decimal Number 8466318
23.9%
Space Separator 7602408
21.5%
Uppercase Letter 345564
 
1.0%
Dash Punctuation 172782
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2764512
14.7%
a 2764512
14.7%
o 2591730
13.8%
r 2591730
13.8%
t 2418948
12.8%
s 2418948
12.8%
y 2418948
12.8%
n 172782
 
0.9%
d 172782
 
0.9%
g 172782
 
0.9%
Other values (2) 345564
 
1.8%
Decimal Number
ValueCountFrequency (%)
4 2246166
26.5%
5 2073384
24.5%
1 1036692
12.2%
2 863910
 
10.2%
0 691128
 
8.2%
6 518346
 
6.1%
3 345564
 
4.1%
7 345564
 
4.1%
9 345564
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
T 172782
50.0%
A 172782
50.0%
Space Separator
ValueCountFrequency (%)
7602408
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 172782
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19178802
54.1%
Common 16241508
45.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2764512
14.4%
a 2764512
14.4%
o 2591730
13.5%
r 2591730
13.5%
t 2418948
12.6%
s 2418948
12.6%
y 2418948
12.6%
n 172782
 
0.9%
d 172782
 
0.9%
T 172782
 
0.9%
Other values (4) 691128
 
3.6%
Common
ValueCountFrequency (%)
7602408
46.8%
4 2246166
 
13.8%
5 2073384
 
12.8%
1 1036692
 
6.4%
2 863910
 
5.3%
0 691128
 
4.3%
6 518346
 
3.2%
3 345564
 
2.1%
7 345564
 
2.1%
9 345564
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 35420310
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7602408
21.5%
e 2764512
 
7.8%
a 2764512
 
7.8%
o 2591730
 
7.3%
r 2591730
 
7.3%
t 2418948
 
6.8%
s 2418948
 
6.8%
y 2418948
 
6.8%
4 2246166
 
6.3%
5 2073384
 
5.9%
Other values (15) 5529024
15.6%

Mother tongue (331)
Text

Distinct: 331
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 177.6 MiB

Length

Max length: 46
Median length: 30
Mean length: 14.661631
Min length: 2

Characters and Unicode

Total characters: 37998990
Distinct characters: 61
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total - Mother tongue
2nd row: Single responses
3rd row: Official languages
4th row: English
5th row: French
ValueCountFrequency (%)
languages 743850
 
16.1%
n.i.e 266220
 
5.8%
n.o.s 117450
 
2.5%
cree 62640
 
1.4%
german 39150
 
0.8%
english 39150
 
0.8%
creole 39150
 
0.8%
non-official 39150
 
0.8%
sign 31320
 
0.7%
tutchone 31320
 
0.7%
Other values (340) 3210300
69.5%

Most occurring characters

ValueCountFrequency (%)
a 4838940
 
12.7%
n 3734910
 
9.8%
i 2826630
 
7.4%
e 2795310
 
7.4%
2027970
 
5.3%
g 2027970
 
5.3%
s 1855710
 
4.9%
l 1714770
 
4.5%
u 1675620
 
4.4%
o 1464210
 
3.9%
Other values (51) 13036950
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29479950
77.6%
Uppercase Letter 3664440
 
9.6%
Space Separator 2027970
 
5.3%
Other Punctuation 1659960
 
4.4%
Close Punctuation 399330
 
1.1%
Open Punctuation 399330
 
1.1%
Dash Punctuation 368010
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 4838940
16.4%
n 3734910
12.7%
i 2826630
9.6%
e 2795310
9.5%
g 2027970
 
6.9%
s 1855710
 
6.3%
l 1714770
 
5.8%
u 1675620
 
5.7%
o 1464210
 
5.0%
r 1166670
 
4.0%
Other values (18) 5379210
18.2%
Uppercase Letter
ValueCountFrequency (%)
S 469800
12.8%
C 321030
 
8.8%
I 305370
 
8.3%
A 297540
 
8.1%
T 289710
 
7.9%
K 211410
 
5.8%
N 203580
 
5.6%
M 203580
 
5.6%
B 148770
 
4.1%
P 148770
 
4.1%
Other values (16) 1064880
29.1%
Other Punctuation
ValueCountFrequency (%)
. 1143180
68.9%
, 446310
 
26.9%
' 70470
 
4.2%
Space Separator
ValueCountFrequency (%)
2027970
100.0%
Close Punctuation
ValueCountFrequency (%)
) 399330
100.0%
Open Punctuation
ValueCountFrequency (%)
( 399330
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 368010
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 33144390
87.2%
Common 4854600
 
12.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 4838940
14.6%
n 3734910
11.3%
i 2826630
 
8.5%
e 2795310
 
8.4%
g 2027970
 
6.1%
s 1855710
 
5.6%
l 1714770
 
5.2%
u 1675620
 
5.1%
o 1464210
 
4.4%
r 1166670
 
3.5%
Other values (44) 9043650
27.3%
Common
ValueCountFrequency (%)
2027970
41.8%
. 1143180
23.5%
, 446310
 
9.2%
) 399330
 
8.2%
( 399330
 
8.2%
- 368010
 
7.6%
' 70470
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 37967670
99.9%
None 31320
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 4838940
 
12.7%
n 3734910
 
9.8%
i 2826630
 
7.4%
e 2795310
 
7.4%
2027970
 
5.3%
g 2027970
 
5.3%
s 1855710
 
4.9%
l 1714770
 
4.5%
u 1675620
 
4.4%
o 1464210
 
3.9%
Other values (48) 13005630
34.3%
None
ValueCountFrequency (%)
é 15660
50.0%
É 7830
25.0%
ò 7830
25.0%

Coordinate
Text

UNIQUE 

Distinct: 2591730
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 166.7 MiB

Length

Max length: 12
Median length: 11
Mean length: 10.453026
Min length: 7

Characters and Unicode

Total characters: 27091422
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2591730
Unique (%): 100.0%

Sample

1st row: 1.1.1.1
2nd row: 1.1.1.2
3rd row: 1.1.1.3
4th row: 1.1.1.4
5th row: 1.1.1.5
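
Each Coordinate value appears to encode one member position per dimension. A sketch under that assumption: the component order GEO, Gender (3), Age (15A), Mother tongue (331) is inferred from the sample rows (only the last component changes while the mother tongue changes) and is not stated by the report. The distinct counts are the ones reported for those variables, and their product matches the number of observations, which is consistent with every coordinate being unique.

```python
from math import prod

# Distinct counts reported for each dimension in this profile.
# The ordering of the dimensions is an assumption (see lead-in).
dimension_sizes = {
    "GEO": 174,
    "Gender (3)": 3,
    "Age (15A)": 15,
    "Mother tongue (331)": 331,
}


def parse_coordinate(coordinate):
    """Split a value like '1.1.1.5' into 1-based positions, one per dimension."""
    parts = [int(p) for p in coordinate.split(".")]
    if len(parts) != len(dimension_sizes):
        raise ValueError(
            f"expected {len(dimension_sizes)} components, got {len(parts)}"
        )
    return dict(zip(dimension_sizes, parts))


# Full cross product of all dimension members.
print(prod(dimension_sizes.values()))
print(parse_coordinate("1.1.1.5"))
```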
ValueCountFrequency (%)
1.1.1.1 1
 
< 0.1%
1.1.1.6 1
 
< 0.1%
1.1.1.22 1
 
< 0.1%
1.1.1.20 1
 
< 0.1%
1.1.1.78 1
 
< 0.1%
1.1.1.10 1
 
< 0.1%
1.1.1.3 1
 
< 0.1%
1.1.1.4 1
 
< 0.1%
1.1.1.5 1
 
< 0.1%
1.1.1.7 1
 
< 0.1%
Other values (2591720) 2591720
> 99.9%

Most occurring characters

ValueCountFrequency (%)
. 7775190
28.7%
1 5291721
19.5%
2 3130074
11.6%
3 2534994
 
9.4%
4 1404864
 
5.2%
5 1389969
 
5.1%
6 1217187
 
4.5%
7 1142712
 
4.2%
8 1068237
 
3.9%
9 1068237
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19316232
71.3%
Other Punctuation 7775190
28.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 5291721
27.4%
2 3130074
16.2%
3 2534994
13.1%
4 1404864
 
7.3%
5 1389969
 
7.2%
6 1217187
 
6.3%
7 1142712
 
5.9%
8 1068237
 
5.5%
9 1068237
 
5.5%
0 1068237
 
5.5%
Other Punctuation
ValueCountFrequency (%)
. 7775190
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 27091422
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 7775190
28.7%
1 5291721
19.5%
2 3130074
11.6%
3 2534994
 
9.4%
4 1404864
 
5.2%
5 1389969
 
5.1%
6 1217187
 
4.5%
7 1142712
 
4.2%
8 1068237
 
3.9%
9 1068237
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27091422
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 7775190
28.7%
1 5291721
19.5%
2 3130074
11.6%
3 2534994
 
9.4%
4 1404864
 
5.2%
5 1389969
 
5.1%
6 1217187
 
4.5%
7 1142712
 
4.2%
8 1068237
 
3.9%
9 1068237
 
3.9%

Knowledge of official languages (5):Total - Knowledge of official languages[1]
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 12698
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1067.7934
Minimum: 0
Maximum: 36620955
Zeros: 2090181
Zeros (%): 80.6%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 145
Maximum: 36620955
Range: 36620955
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 65127.087
Coefficient of variation (CV): 60.992215
Kurtosis: 108736.73
Mean: 1067.7934
Median Absolute Deviation (MAD): 0
Skewness: 270.41049
Sum: 2.7674323 × 10^9
Variance: 4.2415375 × 10^9
Monotonicity: Not monotonic
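
Several of these descriptive statistics are derived from one another. The sketch below checks three of the relationships using the rounded values printed in this report, so the results agree only to the precision shown: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations.

```python
# Values reported for the Total - Knowledge of official languages[1] column.
n = 2_591_730
mean = 1067.7934
std = 65127.087

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum recovered from the mean and the row count

print(f"CV       ~ {cv:.3f}")
print(f"Variance ~ {variance:.4e}")
print(f"Sum      ~ {total:.4e}")
```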
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2090181
80.6%
5 138734
 
5.4%
10 56122
 
2.2%
15 32646
 
1.3%
20 22668
 
0.9%
25 17008
 
0.7%
30 13478
 
0.5%
35 10818
 
0.4%
40 8975
 
0.3%
45 7717
 
0.3%
Other values (12688) 193383
 
7.5%
ValueCountFrequency (%)
0 2090181
80.6%
5 138734
 
5.4%
10 56122
 
2.2%
15 32646
 
1.3%
20 22668
 
0.9%
25 17008
 
0.7%
30 13478
 
0.5%
35 10818
 
0.4%
40 8975
 
0.3%
45 7717
 
0.3%
ValueCountFrequency (%)
36620955 1
< 0.1%
35145265 1
< 0.1%
27296445 1
< 0.1%
20107200 1
< 0.1%
19646805 1
< 0.1%
18866595 1
< 0.1%
18557080 1
< 0.1%
18063870 1
< 0.1%
17800600 1
< 0.1%
17344660 1
< 0.1%

Symbol
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2591730
Missing (%): 100.0%
Memory size: 19.8 MiB

Knowledge of official languages (5):English only[2]
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 10817
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 749.97096
Minimum: 0
Maximum: 25261655
Zeros: 2157937
Zeros (%): 83.3%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 85
Maximum: 25261655
Range: 25261655
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 47375.916
Coefficient of variation (CV): 63.170333
Kurtosis: 97219.603
Mean: 749.97096
Median Absolute Deviation (MAD): 0
Skewness: 259.62718
Sum: 1.9437222 × 10^9
Variance: 2.2444774 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2157937
83.3%
5 124602
 
4.8%
10 49527
 
1.9%
15 28415
 
1.1%
20 19617
 
0.8%
25 14728
 
0.6%
30 11541
 
0.4%
35 9212
 
0.4%
40 7737
 
0.3%
45 6681
 
0.3%
Other values (10807) 161733
 
6.2%
ValueCountFrequency (%)
0 2157937
83.3%
5 124602
 
4.8%
10 49527
 
1.9%
15 28415
 
1.1%
20 19617
 
0.8%
25 14728
 
0.6%
30 11541
 
0.4%
35 9212
 
0.4%
40 7737
 
0.3%
45 6681
 
0.3%
ValueCountFrequency (%)
25261655 1
< 0.1%
24306165 1
< 0.1%
18325325 1
< 0.1%
18285580 1
< 0.1%
13787630 1
< 0.1%
13248710 1
< 0.1%
12640800 1
< 0.1%
12620855 1
< 0.1%
12196575 1
< 0.1%
12154280 1
< 0.1%

Symbol.1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2591730
Missing (%): 100.0%
Memory size: 19.8 MiB

Knowledge of official languages (5):French only[3]
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 3255
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.06102
Minimum: 0
Maximum: 4087895
Zeros: 2513182
Zeros (%): 97.0%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 4087895
Range: 4087895
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 10027.686
Coefficient of variation (CV): 99.224074
Kurtosis: 79968.312
Mean: 101.06102
Median Absolute Deviation (MAD): 0
Skewness: 249.26977
Sum: 2.6192287 × 10^8
Variance: 1.0055448 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2513182
97.0%
5 27848
 
1.1%
10 9482
 
0.4%
15 5053
 
0.2%
20 3340
 
0.1%
25 2419
 
0.1%
30 1964
 
0.1%
35 1538
 
0.1%
40 1223
 
< 0.1%
45 1059
 
< 0.1%
Other values (3245) 24622
 
1.0%
ValueCountFrequency (%)
0 2513182
97.0%
5 27848
 
1.1%
10 9482
 
0.4%
15 5053
 
0.2%
20 3340
 
0.1%
25 2419
 
0.1%
30 1964
 
0.1%
35 1538
 
0.1%
40 1223
 
< 0.1%
45 1059
 
< 0.1%
ValueCountFrequency (%)
4087895 1
< 0.1%
4029960 1
< 0.1%
3980275 1
< 0.1%
3925600 1
< 0.1%
3734010 1
< 0.1%
3728020 1
< 0.1%
3638955 1
< 0.1%
3633980 1
< 0.1%
2173390 1
< 0.1%
2142665 1
< 0.1%

Symbol.2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2591730
Missing (%): 100.0%
Memory size: 19.8 MiB

Knowledge of official languages (5):English and French[4]
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 5473
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 188.13551
Minimum: 0
Maximum: 6581680
Zeros: 2361395
Zeros (%): 91.1%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 10
Maximum: 6581680
Range: 6581680
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 13140.444
Coefficient of variation (CV): 69.845635
Kurtosis: 78425.758
Mean: 188.13551
Median Absolute Deviation (MAD): 0
Skewness: 235.20148
Sum: 4.8759644 × 10^8
Variance: 1.7267127 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2361395
91.1%
5 74834
 
2.9%
10 26072
 
1.0%
15 14684
 
0.6%
20 10096
 
0.4%
25 7504
 
0.3%
30 5904
 
0.2%
35 4752
 
0.2%
40 3829
 
0.1%
45 3483
 
0.1%
Other values (5463) 79177
 
3.1%
ValueCountFrequency (%)
0 2361395
91.1%
5 74834
 
2.9%
10 26072
 
1.0%
15 14684
 
0.6%
20 10096
 
0.4%
25 7504
 
0.3%
30 5904
 
0.2%
35 4752
 
0.2%
40 3829
 
0.1%
45 3483
 
0.1%
ValueCountFrequency (%)
6581680 1
< 0.1%
6130560 1
< 0.1%
5226490 1
< 0.1%
3898980 1
< 0.1%
3766750 1
< 0.1%
3675005 1
< 0.1%
3554305 1
< 0.1%
3419880 1
< 0.1%
3337330 1
< 0.1%
3244350 1
< 0.1%

Symbol.3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2591730
Missing (%): 100.0%
Memory size: 19.8 MiB

Knowledge of official languages (5):Neither English nor French[5]
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 2600
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.55237
Minimum: 0
Maximum: 689725
Zeros: 2451876
Zeros (%): 94.6%
Negative: 0
Negative (%): 0.0%
Memory size: 19.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 5
Maximum: 689725
Range: 689725
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1603.2533
Coefficient of variation (CV): 56.151322
Kurtosis: 63647.582
Mean: 28.55237
Median Absolute Deviation (MAD): 0
Skewness: 204.02385
Sum: 74000035
Variance: 2570421.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2451876
94.6%
5 52148
 
2.0%
10 18248
 
0.7%
15 9934
 
0.4%
20 6899
 
0.3%
25 4881
 
0.2%
30 3806
 
0.1%
35 2944
 
0.1%
40 2503
 
0.1%
45 2052
 
0.1%
Other values (2590) 36439
 
1.4%
ValueCountFrequency (%)
0 2451876
94.6%
5 52148
 
2.0%
10 18248
 
0.7%
15 9934
 
0.4%
20 6899
 
0.3%
25 4881
 
0.2%
30 3806
 
0.1%
35 2944
 
0.1%
40 2503
 
0.1%
45 2052
 
0.1%
ValueCountFrequency (%)
689725 1
< 0.1%
678580 1
< 0.1%
667955 1
< 0.1%
662420 1
< 0.1%
405555 1
< 0.1%
399290 1
< 0.1%
394210 1
< 0.1%
391515 1
< 0.1%
344545 1
< 0.1%
339275 1
< 0.1%

Symbol.4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2591730
Missing (%): 100.0%
Memory size: 19.8 MiB

Interactions

[25 interaction plots; images not preserved]

Correlations

[Correlation heatmap (Matplotlib SVG image) not reproduced.]
Correlation matrix. Columns [1] to [5] are the Knowledge of official languages variables carrying the same bracketed index in the row labels.

Variable | [1] | [2] | [3] | [4] | [5] | Gender (3) | Age (15A)
Knowledge of official languages (5):Total - Knowledge of official languages[1] | 1.000 | 0.924 | 0.398 | 0.690 | 0.543 | 0.002 | 0.005
Knowledge of official languages (5):English only[2] | 0.924 | 1.000 | 0.268 | 0.597 | 0.536 | 0.002 | 0.005
Knowledge of official languages (5):French only[3] | 0.398 | 0.268 | 1.000 | 0.525 | 0.409 | 0.003 | 0.005
Knowledge of official languages (5):English and French[4] | 0.690 | 0.597 | 0.525 | 1.000 | 0.542 | 0.001 | 0.005
Knowledge of official languages (5):Neither English nor French[5] | 0.543 | 0.536 | 0.409 | 0.542 | 1.000 | 0.004 | 0.007
Gender (3) | 0.002 | 0.002 | 0.003 | 0.001 | 0.004 | 1.000 | 0.000
Age (15A) | 0.005 | 0.005 | 0.005 | 0.005 | 0.007 | 0.000 | 1.000
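
A pairwise correlation matrix of this kind can be approximated with pandas. This is a minimal sketch, not the report's own method: this section does not say which coefficient was used, so Spearman rank correlation is an assumption here, and the toy frame and its shortened column names (`total`, `english_only`, `french_only`) are hypothetical stand-ins for the real columns.

```python
import pandas as pd

# Hypothetical toy data; the column names are shortened stand-ins,
# not the report's actual column names.
df = pd.DataFrame({
    "total": [100, 50, 20, 10],
    "english_only": [70, 30, 15, 4],
    "french_only": [10, 15, 0, 5],
})

# Pairwise Spearman rank correlation between the numeric columns.
# (Assumption: the report may use a different coefficient.)
corr = df.corr(method="spearman")
print(corr.round(3))
```

The result is a symmetric matrix with ones on the diagonal, matching the layout of the correlation values listed in this section.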

Missing values

[Image not reproduced: nullity by column.]
A simple visualization of nullity by column.
[Image not reproduced: nullity matrix.]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
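
The per-column nullity summarized here can be recomputed directly with pandas. This is a minimal sketch on a hypothetical three-row frame: the values are borrowed from the sample rows for illustration only, and the short column name `Total` is a stand-in for the full column name.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: one text column, one numeric
# column, and one all-missing "Symbol" column.
df = pd.DataFrame({
    "GEO": ["Canada", "Canada", "Nunavut"],
    "Total": [36620955, 35145265, 0],
    "Symbol": [np.nan, np.nan, np.nan],
})

# Count and percentage of missing cells per column.
is_missing = df.isna()
summary = pd.DataFrame({
    "missing": is_missing.sum(),
    "missing_pct": is_missing.mean() * 100,
})
print(summary)
```

A column such as `Symbol`, which is empty in every row, shows up with a missing percentage of 100, which is how the Symbol columns are flagged in the alerts.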

Sample

First rows

(index) | REF_DATE | GEO | DGUID | Gender (3) | Age (15A) | Mother tongue (331) | Coordinate | Knowledge of official languages (5):Total - Knowledge of official languages[1] | Symbol | Knowledge of official languages (5):English only[2] | Symbol.1 | Knowledge of official languages (5):French only[3] | Symbol.2 | Knowledge of official languages (5):English and French[4] | Symbol.3 | Knowledge of official languages (5):Neither English nor French[5] | Symbol.4
0 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Total - Mother tongue | 1.1.1.1 | 36620955 | NaN | 25261655 | NaN | 4087895 | NaN | 6581680 | NaN | 689725 | NaN
1 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Single responses | 1.1.1.2 | 35145265 | NaN | 24306165 | NaN | 4029960 | NaN | 6130560 | NaN | 678580 | NaN
2 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Official languages | 1.1.1.3 | 27296445 | NaN | 18325325 | NaN | 3734010 | NaN | 5226490 | NaN | 10620 | NaN
3 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | English | 1.1.1.4 | 20107200 | NaN | 18285580 | NaN | 5990 | NaN | 1806605 | NaN | 9025 | NaN
4 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | French | 1.1.1.5 | 7189245 | NaN | 39740 | NaN | 3728020 | NaN | 3419880 | NaN | 1595 | NaN
5 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Non-official languages | 1.1.1.6 | 7848820 | NaN | 5980845 | NaN | 295950 | NaN | 904065 | NaN | 667955 | NaN
6 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Indigenous languages | 1.1.1.7 | 148895 | NaN | 123580 | NaN | 10995 | NaN | 8785 | NaN | 5535 | NaN
7 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Algonquian languages | 1.1.1.8 | 97125 | NaN | 79020 | NaN | 10730 | NaN | 5625 | NaN | 1760 | NaN
8 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Blackfoot | 1.1.1.9 | 2520 | NaN | 2480 | NaN | 0 | NaN | 25 | NaN | 10 | NaN
9 | 2021 | Canada | 2021A000011124 | Total - Gender | Total - Age | Cree-Innu languages | 1.1.1.10 | 67665 | NaN | 51030 | NaN | 10405 | NaN | 4780 | NaN | 1455 | NaN

Last rows

(index) | REF_DATE | GEO | DGUID | Gender (3) | Age (15A) | Mother tongue (331) | Coordinate | Knowledge of official languages (5):Total - Knowledge of official languages[1] | Symbol | Knowledge of official languages (5):English only[2] | Symbol.1 | Knowledge of official languages (5):French only[3] | Symbol.2 | Knowledge of official languages (5):English and French[4] | Symbol.3 | Knowledge of official languages (5):Neither English nor French[5] | Symbol.4
2591720 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Estonian | 174.3.15.322 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591721 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Finnish | 174.3.15.323 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591722 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Hungarian | 174.3.15.324 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591723 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Other languages, n.i.e. | 174.3.15.325 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591724 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Multiple responses | 174.3.15.326 | 15 | NaN | 15 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591725 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | English and French | 174.3.15.327 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591726 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | English and non-official language(s) | 174.3.15.328 | 15 | NaN | 15 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591727 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | French and non-official language(s) | 174.3.15.329 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591728 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | English, French and non-official language(s) | 174.3.15.330 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN
2591729 | 2021 | Nunavut | 2021A000262 | Women+ | 75 years and over | Multiple non-official languages | 174.3.15.331 | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN | 0 | NaN